Building Corpora for the Development of a Dependency Parser for Spanish Using Maltparser
Abstract
This paper details the process followed to create training and test corpora for a dependency parser generator (MaltParser). The starting point is the Cast3LB corpus, which contains constituency analyses of Spanish texts. These constituency analyses are automatically transformed into dependency analyses. In addition, the paper describes how a set of syntactic function labels for the training corpus was obtained empirically and semi-automatically. The result of this process is a dependency parser for Spanish that achieves 91% precision when determining dependencies.
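The abstract does not say how the constituency analyses are turned into dependency analyses. One common way to do this kind of conversion is head percolation: pick a head child for every phrase, then attach the lexical heads of the remaining children to it. The sketch below illustrates that general idea only. The tree encoding, the `HEAD_RULES` table, the tag names, and the function names are all hypothetical; they are not the Cast3LB format or the authors' procedure.

```python
from typing import Dict, List

# Assumed encoding (hypothetical, not the Cast3LB format):
#   leaf  = (pos_tag, word, token_index)   -> 3 elements
#   inner = (phrase_label, [children])     -> 2 elements

# Hypothetical head rules: for each phrase label, child labels in priority order.
HEAD_RULES: Dict[str, List[str]] = {
    "S": ["VP", "NP"],
    "VP": ["V", "VP"],
    "NP": ["N", "NP"],
    "PP": ["P"],
}


def is_leaf(node) -> bool:
    return len(node) == 3


def head_child(label, children):
    """Pick the head child by rule priority, falling back to the leftmost child."""
    for wanted in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == wanted:
                return child
    return children[0]


def to_dependencies(node, arcs: Dict[int, int]) -> int:
    """Return the token index of the node's lexical head.

    Side effect: records arcs[dependent_index] = head_index.
    """
    if is_leaf(node):
        return node[2]
    label, children = node
    head = head_child(label, children)
    head_index = to_dependencies(head, arcs)
    for child in children:
        if child is not head:
            arcs[to_dependencies(child, arcs)] = head_index
    return head_index


def convert(tree) -> Dict[int, int]:
    """Convert a constituency tree into a dependent -> head map (0 = root)."""
    arcs: Dict[int, int] = {}
    root_index = to_dependencies(tree, arcs)
    arcs[root_index] = 0
    return arcs


# "El perro come carne" with a toy analysis.
EXAMPLE = ("S", [
    ("NP", [("D", "El", 1), ("N", "perro", 2)]),
    ("VP", [("V", "come", 3), ("NP", [("N", "carne", 4)])]),
])

if __name__ == "__main__":
    print(sorted(convert(EXAMPLE).items()))
    # [(1, 2), (2, 3), (3, 0), (4, 3)]
```

In this toy example the verb "come" becomes the root, and "perro" and "carne" attach to it. Assigning syntactic function labels to the arcs, which the abstract treats as a separate step, is not covered here.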
Similar resources
Comparing Rule-Based and Data-Driven Dependency Parsing of Learner Language
We explore the performance of two dependency parsing approaches, the rule-based WCDG approach (Foth and Menzel 2006) and the data-driven dependency parser MaltParser (Nivre et al. 2007), on texts written by language learners. We show that WCDG outperforms MaltParser in identifying the main functor-argument relations, whereas MaltParser is more successful than WCDG in establishing optional, adjunct...
MaltParser: A Data-Driven Parser-Generator for Dependency Parsing
We introduce MaltParser, a data-driven parser generator for dependency parsing. Given a treebank in dependency format, MaltParser can be used to induce a parser for the language of the treebank. MaltParser supports several parsing algorithms and learning algorithms, and allows user-defined feature models, consisting of arbitrary combinations of lexical features, part-of-speech features and depe...
Improving Parsing Accuracy for Spanish using Maltparser / Mejora de la Precisión del Análisis para el Español con Maltparser
In recent years, dependency parsing has been carried out by machine learning-based systems that show high accuracy, though usually below 90% Labelled Attachment Score (LAS). Maltparser is one such system. Machine learning makes it possible to obtain a parser for any language that has an adequate training corpus. Since such systems generally cannot be modified, the following question arises: Can we be...
A Data-Driven Dependency Parser for Bulgarian
One of the main motivations for building treebanks is that they facilitate the development of syntactic parsers, by providing realistic data for evaluation as well as inductive learning. In this paper we present what we believe to be the first robust data-driven parser for Bulgarian, trained and evaluated on data from BulTreeBank (Simov et al., 2002). The parser uses dependency-based representa...
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...
Journal title:
- Procesamiento del Lenguaje Natural
Volume 39
Publication date: 2007